SLIP: Self-supervision Meets Language-Image Pre-training
Authors
Abstract
Recent work has shown that self-supervised pre-training leads to improvements over supervised learning on challenging visual recognition tasks. CLIP, an exciting new approach to learning with language supervision, demonstrates promising performance on a wide variety of benchmarks. In this work, we explore whether self-supervised learning can aid in the use of language supervision for visual representation learning with Vision Transformers. We introduce SLIP, a multi-task learning framework for combining self-supervised learning and CLIP pre-training. After pre-training, we thoroughly evaluate representation quality and compare performance to both CLIP and self-supervised learning under three distinct settings: zero-shot transfer, linear classification, and end-to-end finetuning. Across ImageNet and a battery of additional datasets, we find that SLIP improves accuracy by a large margin. We validate our results further with experiments on different model sizes, training schedules, and pre-training datasets. Our findings show that SLIP enjoys the best of both worlds: better performance than self-supervision (+8.1% accuracy) and language supervision (+5.2% accuracy). Code is available at: github.com/facebookresearch/SLIP .
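The multi-task combination described above can be sketched as a single objective: a symmetric image-text contrastive (CLIP-style) loss plus a scaled self-supervised contrastive loss over two augmented views. The sketch below uses NumPy and random embeddings; the function names, the temperatures, and the scale factor are illustrative assumptions, not the paper's exact hyperparameters.

```python
import numpy as np

def cross_entropy(logits, targets):
    # Mean softmax cross-entropy; targets are row indices of the positives.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()

def clip_loss(img_emb, txt_emb, temperature=0.07):
    # Symmetric InfoNCE over image-text pairs; matched pairs sit on the diagonal.
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature
    targets = np.arange(len(img))
    return 0.5 * (cross_entropy(logits, targets) + cross_entropy(logits.T, targets))

def ssl_loss(z1, z2, temperature=0.1):
    # Simplified view-to-view contrastive loss (only cross-view negatives).
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature
    targets = np.arange(len(z1))
    return 0.5 * (cross_entropy(logits, targets) + cross_entropy(logits.T, targets))

def slip_loss(img_emb, txt_emb, z1, z2, scale=1.0):
    # Multi-task objective: CLIP loss plus a scaled self-supervised term.
    return clip_loss(img_emb, txt_emb) + scale * ssl_loss(z1, z2)

rng = np.random.default_rng(0)
B, D = 8, 32  # batch size, embedding dimension
loss = slip_loss(rng.normal(size=(B, D)), rng.normal(size=(B, D)),
                 rng.normal(size=(B, D)), rng.normal(size=(B, D)))
print(f"combined SLIP-style loss: {loss:.3f}")
```

In practice both terms share the image encoder, with separate projection heads for the image-text and view-to-view branches; the single `scale` knob is what makes the framework a weighted multi-task sum rather than two independent pre-training runs.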
Similar Resources
Language Acquisition Meets Language Evolution
Recent research suggests that language evolution is a process of cultural change, in which linguistic structures are shaped through repeated cycles of learning and use by domain-general mechanisms. This paper draws out the implications of this viewpoint for understanding the problem of language acquisition, which is cast in a new, and much more tractable, form. In essence, the child faces a pro...
On the Iranian In-service and Pre-service Language Teachers’ Perceptions of Educational Supervision Concerning their Professional Development
Teacher supervision plays a pivotal role in the improvement of the education system and in the way teachers and student teachers perceive it. Consequently, language teacher supervisors can utilize appropriate supervisory models to keep teachers up to date and promote them professionally. The present study investigated the role of language teacher supervisors in student teachers and in-service teac...
Clinical supervision training across contexts.
BACKGROUND Clinicians require specific skills to teach or supervise students in the workplace; however, there are barriers to accessing faculty member development, such as time, cost and suitability. The Clinical Supervision Support Across Contexts (ClinSSAC) programme was designed to provide accessible interprofessional educator training to clinical supervisors across a wide range of clinical ...
Minimal Supervision for Language Learning
A fundamental step in sentence comprehension involves assigning semantic roles to sentence constituents. To accomplish this, the listener must parse the sentence, find constituents that are candidate arguments, and assign semantic roles to those constituents. Each step depends on prior lexical and syntactic knowledge. Where do children begin in solving this problem when learning their first lan...
Language Generation with Recurrent Generative Adversarial Networks without Pre-training
Generative Adversarial Networks (GANs) have shown great promise recently in image generation. Training GANs for text generation has proven to be more difficult, because of the non-differentiable nature of generating text with recurrent neural networks. Consequently, past work has either resorted to pre-training with maximumlikelihood or used convolutional networks for generation. In this work, ...
Journal
Journal title: Lecture Notes in Computer Science
Year: 2022
ISSN: 1611-3349, 0302-9743
DOI: https://doi.org/10.1007/978-3-031-19809-0_30